DYNIQX: A novel meta-search engine for the web
The effect of metadata in collection fusion has not been sufficiently studied. In response, we present a novel meta-search engine called Dyniqx for metadata-based search. Dyniqx integrates search results from document, image, and video search services into a unified list of ranked results. It exploits the metadata available in search services such as PubMed, Google Scholar, Google Image Search, and Google Video Search to fuse results from heterogeneous search engines. In addition, metadata from these search engines is used to generate dynamic query controls, such as sliders and tick boxes, with which users can filter search results. Our preliminary user evaluation shows that Dyniqx can help users complete information search tasks more efficiently and successfully than three well-known search engines. We also carried out a controlled user evaluation of the integration of six document/image/video search engines (Google Scholar, PubMed, Intute, Google Image, Yahoo Image, and Google Video) in Dyniqx. We designed a questionnaire to evaluate different aspects of Dyniqx in helping users complete search tasks; each user performed a number of search tasks with Dyniqx before completing the questionnaire. Our evaluation results confirm the effectiveness of Dyniqx's meta-search in assisting user search tasks, and provide insights into better designs of the Dyniqx interface.
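The abstract does not specify Dyniqx's fusion method, so the following is an illustration only: a minimal sketch of interleaving ranked lists from several services and filtering them by a metadata control (here a publication-year range, as a slider would supply). The round-robin interleaving, field names, and filter are all assumptions, not the paper's algorithm.

```python
def fuse_and_filter(result_lists, year_range=None):
    """Round-robin fusion of ranked lists from several search services,
    optionally filtered by a publication-year range (as a slider control
    might supply). Each result is a dict with a 'title' and optional
    'year' metadata field. Duplicates (by title) are kept once, at their
    best rank."""
    fused, seen = [], set()
    longest = max(map(len, result_lists), default=0)
    for rank in range(longest):
        for lst in result_lists:
            if rank >= len(lst):
                continue
            r = lst[rank]
            if r["title"] in seen:
                continue  # already fused from another service
            if year_range and not (year_range[0] <= r.get("year", 0) <= year_range[1]):
                continue  # filtered out by the metadata control
            seen.add(r["title"])
            fused.append(r)
    return fused
```

For example, tightening the year range immediately re-filters the fused list, which is the interaction the dynamic query controls described above enable.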
Social Search with Missing Data: Which Ranking Algorithm?
Online social networking tools are extremely popular, but can miss potential discoveries latent in the social 'fabric'. Matchmaking services that perform naive profile matching with old database technology are too brittle in the absence of key data, and even modern ontological markup, though powerful, can be onerous at data-input time. In this paper, we present a system called BuddyFinder which can automatically identify the buddies who best match a user's search requirements, specified as a term-based query, even in the absence of stored user profiles. We deploy and compare five statistical measures, namely our own CORDER, mutual information (MI), phi-squared, improved MI, and Z score, as well as two TF/IDF-based baseline methods, to find the online users who best match the search requirements based on 'inferred profiles' of these users in the form of scavenged web pages. These measures identify statistically significant relationships between online users and a term-based query. Our user evaluation on two groups of users shows that BuddyFinder can find users highly relevant to search queries, and that CORDER achieved the best average ranking correlations among all seven algorithms and improved the performance of both baseline methods.
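As a rough illustration of two of the association measures named above (not the paper's implementation), pointwise MI and phi-squared can both be computed from joint and marginal occurrence counts of a user and a query term over the scavenged pages:

```python
import math

def mutual_information(n_xy, n_x, n_y, n):
    """Pointwise mutual information between a user x and a query term y:
    n_xy pages containing both, n_x / n_y pages containing each alone,
    n pages in total. Returns 0.0 when they never co-occur."""
    if n_xy == 0:
        return 0.0
    return math.log2((n_xy * n) / (n_x * n_y))

def phi_squared(n_xy, n_x, n_y, n):
    """Phi-squared association from the 2x2 contingency table implied
    by the joint count n_xy and the marginals n_x, n_y over n pages."""
    a = n_xy                   # x and y co-occur
    b = n_x - n_xy             # x without y
    c = n_y - n_xy             # y without x
    d = n - n_x - n_y + n_xy   # neither
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return ((a * d - b * c) ** 2) / denom
```

Both measures peak when the user and term always appear together, which is the "statistically significant relationship" the abstract refers to; CORDER, improved MI, and Z score are not sketched here.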
The Open University at TREC 2007 Enterprise Track
The Multimedia and Information Systems group at the Knowledge Media Institute of the Open University participated in the Expert Search and Document Search tasks of the Enterprise Track in TREC 2007. In both tasks, we studied the effect of anchor texts (in addition to document contents), document authority, URL length, query expansion, and relevance feedback on search effectiveness. In the expert search task, we continued using a two-stage language model consisting of a document relevance model and a co-occurrence model; the document relevance model is equivalent to our approach in the document search task. We used our innovative multiple-window-based co-occurrence approach, whose underlying assumption is that there are multiple levels of association between an expert and his/her expertise. Our experimental results show that introducing these additional features alongside document contents improved retrieval effectiveness.
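The multiple-window idea above can be sketched as follows; the specific window sizes, weights, and exact-match tokenisation are invented for illustration, since the abstract only states that expert/term associations are measured at several levels of proximity:

```python
def multi_window_cooccurrence(tokens, expert, query_terms,
                              windows=(5, 20, 100), weights=(1.0, 0.5, 0.25)):
    """Score an expert's association with query terms at several window
    sizes, so that tight (e.g. sentence-level) co-occurrences contribute
    more than loose (document-level) ones. Windows and weights here are
    illustrative placeholders, not the paper's parameters."""
    names = set(query_terms) | {expert}
    positions = {t: [i for i, tok in enumerate(tokens) if tok == t]
                 for t in names}
    score = 0.0
    for w, wt in zip(windows, weights):
        for ep in positions.get(expert, []):
            for t in query_terms:
                # credit this window level if any term occurrence is close enough
                if any(abs(ep - qp) <= w for qp in positions.get(t, [])):
                    score += wt
    return score
```

A mention two tokens away from the expert's name scores at all three levels, while a mention fifty tokens away scores only at the widest; this mirrors the graded "multiple levels of association" assumption.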
Modeling document features for expert finding
We argue that expert finding is sensitive to multiple document features in an organization, and can therefore benefit from the incorporation of these features. We propose a unified language model which integrates multiple document features, namely multiple levels of associations, PageRank, indegree, internal document structure, and URL length. Our experiments on two TREC Enterprise Track collections, the W3C and CSIRO datasets, demonstrate that the nature of the two organizational intranets and the two types of expert finding task, i.e., key contact finding for CSIRO and knowledgeable person finding for W3C, influence the effectiveness of different document features. Our work provides insights into which document features work for certain types of expert finding task, and helps in designing expert finding strategies that are effective for different scenarios.
Integrating multiple document features in language models for expert finding
We argue that expert finding is sensitive to multiple document features in an organizational intranet. These features include multiple levels of association between experts and a query topic, from sentence and paragraph up to document level; document authority information such as the PageRank, indegree, and URL length of documents; and internal document structures that indicate an expert's relationship with the content of a document. Our assumption is that expert finding can benefit substantially from the incorporation of these features, yet existing language modeling approaches for expert finding have not sufficiently taken them into account. We propose a novel language modeling approach, which integrates multiple document features, for expert finding. Our experiments on two large-scale TREC Enterprise Track datasets, the W3C and CSIRO datasets, demonstrate that the nature of the two organizational intranets and the two types of expert finding task, i.e., key contact finding for CSIRO and knowledgeable person finding for W3C, influence the effectiveness of different document features. Our work provides insights into which document features work for certain types of expert finding task, and helps in designing expert finding strategies that are effective for different scenarios. Our main contribution is to develop an effective formal method for modeling multiple document features in expert finding, and to conduct a systematic investigation of their effects. It is worth noting that our novel approach achieves better results in terms of MAP than previous language-model-based approaches and the best automatic runs in the TREC 2006 and TREC 2007 expert search tasks.
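A two-stage model of the kind described above can be sketched as ranking experts by a sum over documents of query likelihood, a document prior, and expert/document association strength. The smoothing floor, the use of PageRank as the prior, and the data shapes below are assumptions for illustration, not the paper's exact formulation:

```python
def expert_scores(query, docs, assoc, prior):
    """Two-stage sketch: score(e) = sum over d of p(q|d) * p(d) * a(e,d).
    docs:  {doc_id: {term: probability}} unigram language models;
    assoc: {doc_id: {expert: strength}}, e.g. from window co-occurrence;
    prior: {doc_id: weight}, e.g. a normalised PageRank (an assumption here).
    Returns experts sorted by descending score."""
    scores = {}
    for d, lm in docs.items():
        p_q_d = 1.0
        for t in query:
            p_q_d *= lm.get(t, 1e-6)  # crude floor in place of real smoothing
        for e, a in assoc.get(d, {}).items():
            scores[e] = scores.get(e, 0.0) + p_q_d * prior.get(d, 1.0) * a
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

The document features the abstract lists (PageRank, indegree, URL length, structure) would enter through the `prior` and `assoc` terms, which is what makes the model a single unified framework rather than a post-hoc re-ranking.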
The Open University at TREC 2006 Enterprise Track Expert Search Task
The Multimedia and Information Systems group at the Knowledge Media Institute of the Open University participated in the Expert Search task of the Enterprise Track in TREC 2006. We proposed to address three main innovative points in a two-stage language model, consisting of a document relevance model and a co-occurrence model, in order to improve the performance of expert search. The three innovative points are based on characteristics of documents. First, document authority, in terms of PageRank, is considered in the document relevance model. Second, internal document structure is taken into account in the co-occurrence model. Third, we consider multiple levels of association between experts and query terms in the co-occurrence model. Our experiments on the TREC 2006 Expert Search task show that addressing these three points led to improved effectiveness of expert search on the W3C dataset.
Dyniqx: a novel meta-search engine for metadata based cross search
The effect of metadata in collection fusion has not been sufficiently studied. In response, we present a novel meta-search engine called Dyniqx for metadata-based cross search. Dyniqx exploits the metadata available in academic search services such as PubMed and Google Scholar to fuse search results from heterogeneous search engines. In addition, metadata from these search engines is used to generate dynamic query controls, such as sliders and tick boxes, with which users can filter search results. Our preliminary user evaluation shows that Dyniqx can help users complete information search tasks more efficiently and successfully than three well-known search engines.
Exploiting semantic association to answer 'vague queries'.
Although today's web search engines are very powerful, they still fail to provide intuitively relevant results for many types of queries, especially ones that are only vaguely formed in the user's own mind. We argue that associations between terms in a search query can reveal the underlying information needs in the user's mind and should be taken into account in search. We propose a multi-faceted approach to detect and exploit such associations. The CORDER method measures the association strength between query terms, and queries whose terms have low association strength with each other are treated as vague queries. For a vague query, we use WordNet to find terms related to the query terms and compose extended queries, relying especially on least common subsumers (LCS). We then use the relation strength between terms, calculated by the CORDER method, to refine these extended queries. Finally, we use the Hyperspace Analogue to Language (HAL) model and the information flow (IF) method to expand the refined queries. Our initial experimental results on a corpus of 500 books from Amazon show that our approach can find the right books for users given authentic vague queries, even in cases where Google and Amazon's own book search fail.
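The HAL model mentioned above builds distance-weighted co-occurrence vectors from a sliding window over text. The sketch below is a simplified symmetric variant (HAL proper keeps the preceding/following directions separate, as a matrix and its transpose); the window size is illustrative:

```python
from collections import defaultdict

def hal_vectors(tokens, window=4):
    """Simplified HAL: each term's vector accumulates distance-weighted
    co-occurrence counts with nearby terms, weight = window - distance + 1,
    so adjacent terms count most. Symmetric variant for brevity."""
    vec = defaultdict(lambda: defaultdict(float))
    for i, t in enumerate(tokens):
        for d in range(1, window + 1):
            j = i - d
            if j < 0:
                break
            w = window - d + 1          # closer terms get higher weight
            vec[t][tokens[j]] += w      # t sees tokens[j] behind it
            vec[tokens[j]][t] += w      # and vice versa (symmetric variant)
    return vec
```

Query expansion then picks, for each query term, the terms with the highest-weighted HAL vector entries; the information flow step described in the abstract operates on these vectors and is not sketched here.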
Abstract Mining Web Site Link Structures for Adaptive Web Site Navigation and Search
This thesis is concerned with mining the log file of a Web site for knowledge about the site and its users, and with using that knowledge to help users navigate and search the site effectively and efficiently. First, we investigate approaches to adapting the organization and presentation of a Web site by learning from its link structure and from user behavior. Approaches are developed for presenting a Web site using a link hierarchy and a conceptual link hierarchy, respectively, based on how users have used the site's link structure; both kinds of hierarchy can help users navigate the site. Second, we develop approaches for building a first-order Markov chain model of user navigation on the Web site link structure, the link hierarchy, and the conceptual link hierarchy, respectively. Under a collaborative assumption, the model can be used for link prediction that assists users in navigating the site. Third, approaches are developed for ranking Web pages based on how users have used the site's link structure; these page rankings can be used to help users search the site. The approaches developed in the thesis have been implemented in a prototype called Online Navigation Explorer (ONE). First, link hierarchies and conceptual link hierarchies are visualized in ONE. Second, link prediction using Markov chain models is integrated with the link hierarchies and conceptual link hierarchies in ONE. Third, search results are visualized in ONE. Experimental results show that ONE can help users navigate a Web site and find their desired information effectively and efficiently. The work presented in the thesis is a step towards an adaptive Web site that assists users in navigating and searching for their desired information.
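The first-order Markov chain model described above can be sketched as follows: transition probabilities are estimated from page-visit sessions mined from the server log, and link prediction returns the most likely next pages. The data shapes and maximum-likelihood estimation are assumptions for illustration; the thesis's actual estimation details are not given in the abstract.

```python
from collections import defaultdict

def transition_model(sessions):
    """Build a first-order Markov chain from navigation sessions
    (lists of page ids), using maximum-likelihood estimates:
    p(b|a) = count(a -> b) / count(a -> any)."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in sessions:
        for a, b in zip(s, s[1:]):
            counts[a][b] += 1
    model = {}
    for a, nexts in counts.items():
        total = sum(nexts.values())
        model[a] = {b: c / total for b, c in nexts.items()}
    return model

def predict_next(model, page, k=3):
    """Top-k most likely next pages from the current page,
    i.e. the links to suggest to the user."""
    return sorted(model.get(page, {}).items(), key=lambda kv: -kv[1])[:k]
```

The same construction applies unchanged when the "pages" are nodes of a link hierarchy or conceptual link hierarchy rather than raw URLs, which is how the thesis combines the two ideas.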